School pupils - grades in primary school and status for secondary education

The script below demonstrates how to follow student cohorts through the education system from primary school to upper secondary education.

In this specific example, we look at school pupils who completed primary school in 2018 and who lived in Trøndelag. The script produces statistics on primary school results at several grade levels and in selected subjects (national test scores and final grades), and then examines how the pupils progress through upper secondary education. At the end, a table summarizes the relationship between total grade points from primary school and the number of semesters a pupil has spent before completing upper secondary education for the first time.

Microdata.no also holds extensive data at college and university level, but following a cohort through its entire educational career is difficult because the required history is longer than the period for which data are available. It is nevertheless entirely possible to analyse cohorts from, for example, upper secondary school through to completed higher education or, as here, from primary school through upper secondary school.

textblock
Note that the 2018 cohort is the most recent cohort that can be observed if you want a full history up to and including high school. This also lets you measure completion beyond the year in which high school is normally completed. However, it means that you cannot use data from the national tests for 5th graders, which are unavailable for the 2019 cohort and earlier. You therefore have to weigh what matters most: including all national tests or including data for high school.

As SSB adds new updates, it will become possible to follow the entire history from the 5th-grade national tests through to the end of high school.

Note that data are also missing for the 9th-grade national test in English.
endblock

//------ Creating population (persons residing in Trøndelag who completed primary school in 2018) ------

require no.ssb.fdb:30 as db

create-dataset students
import db/NUDB_AAR_FORSTE_FULLF_GS as year_completed_ps
import db/NUDB_KOMM_16 as residence16
keep if substr(residence16,1,2) == '50' & year_completed_ps == 2018
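
//A quick check of the population, added here as a suggestion and not part of the original script: the year should be 2018 for everyone, and every municipality code should start with 50.
tabulate year_completed_ps
tabulate residence16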


//------ Primary school: Norwegian/reading ------
import db/NUDB_NP_SKALAPOENG_NPLES05 as np_5_reading //Data not available for the 2019 cohort and earlier
import db/NUDB_NP_SKALAPOENG_NPLES08 as np_8_reading
import db/NUDB_NP_SKALAPOENG_NPLES09 as np_9_reading
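import db/NUDB_GS_STP_NOH as stp_ps_norwegian_main //Final assessment grade (standpunkt): Norwegian, main written form
import db/NUDB_GS_STP_NOM as stp_ps_norwegian_main_oral //Final assessment grade (standpunkt): Norwegian, oral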

summarize np_5_reading np_8_reading np_9_reading
//histogram np_5_reading, percent
histogram np_8_reading, percent
histogram np_9_reading, percent

tabulate stp_ps_norwegian_main, missing
destring stp_ps_norwegian_main, force //Convert the grade from text to a number; force sets non-numeric values to missing
summarize stp_ps_norwegian_main

tabulate stp_ps_norwegian_main_oral, missing
destring stp_ps_norwegian_main_oral, force
summarize stp_ps_norwegian_main_oral

//boxplot np_5_reading, over(stp_ps_norwegian_main)
boxplot np_8_reading, over(stp_ps_norwegian_main)
boxplot np_9_reading, over(stp_ps_norwegian_main)
histogram stp_ps_norwegian_main, discrete
//histogram np_5_reading, by(stp_ps_norwegian_main) normal
histogram np_8_reading, by(stp_ps_norwegian_main) normal
histogram np_9_reading, by(stp_ps_norwegian_main) normal
tabulate stp_ps_norwegian_main, cellpct freq missing
//tabulate stp_ps_norwegian_main, summarize(np_5_reading) missing
tabulate stp_ps_norwegian_main, summarize(np_8_reading) missing
tabulate stp_ps_norwegian_main, summarize(np_9_reading) missing
barchart(mean) np_8_reading np_9_reading
barchart(mean) np_8_reading np_9_reading, over(stp_ps_norwegian_main)
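
//A possible supplement, not part of the original script: a correlation matrix puts a number on how closely the national test scores and the final grade follow each other.
//Assumes stp_ps_norwegian_main has already been converted to a numeric variable by the destring command above.
correlate np_8_reading np_9_reading stp_ps_norwegian_main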


//------ Primary school: Math ------
import db/NUDB_NP_SKALAPOENG_NPREG05 as np_5_math //Data not available for the 2019 cohort and earlier
import db/NUDB_NP_SKALAPOENG_NPREG08 as np_8_math
import db/NUDB_NP_SKALAPOENG_NPREG09 as np_9_math
import db/NUDB_GS_STP_MAT as stp_ps_math

summarize np_5_math np_8_math np_9_math
//histogram np_5_math, percent
histogram np_8_math, percent
histogram np_9_math, percent

tabulate stp_ps_math, missing
destring stp_ps_math, force
summarize stp_ps_math

//boxplot np_5_math, over(stp_ps_math)
boxplot np_8_math, over(stp_ps_math)
boxplot np_9_math, over(stp_ps_math)
histogram stp_ps_math, discrete
//histogram np_5_math, by(stp_ps_math) normal
histogram np_8_math, by(stp_ps_math) normal
histogram np_9_math, by(stp_ps_math) normal
tabulate stp_ps_math, cellpct freq missing
//tabulate stp_ps_math, summarize(np_5_math) missing
tabulate stp_ps_math, summarize(np_8_math) missing
tabulate stp_ps_math, summarize(np_9_math) missing
barchart(mean) np_8_math np_9_math
barchart(mean) np_8_math np_9_math, over(stp_ps_math)


//------ Primary school: English ------
import db/NUDB_NP_SKALAPOENG_NPENG05 as np_5_english //Data not available for the 2019 cohort and earlier
import db/NUDB_NP_SKALAPOENG_NPENG08 as np_8_english
//National tests, 9th grade: Data not available
import db/NUDB_GS_STP_ENS as stp_ps_english

summarize np_5_english np_8_english
//histogram np_5_english, percent
histogram np_8_english, percent

tabulate stp_ps_english, missing
destring stp_ps_english, force
summarize stp_ps_english

//boxplot np_5_english, over(stp_ps_english)
boxplot np_8_english, over(stp_ps_english)
histogram stp_ps_english, discrete
//histogram np_5_english, by(stp_ps_english) normal
histogram np_8_english, by(stp_ps_english) normal
tabulate stp_ps_english, cellpct freq missing
//tabulate stp_ps_english, summarize(np_5_english) missing
tabulate stp_ps_english, summarize(np_8_english) missing
barchart(mean) np_8_english
barchart(mean) np_8_english, over(stp_ps_english)


//------ Primary school: Primary school points (total grade points at the end of primary school) ------
import db/NUDB_KURS_GRPOENG as ps_points
summarize ps_points
histogram ps_points, percent


//------ High school ------
//Note that the latest data update for the 2018 cohort covers one year beyond the normal completion time. Pupils who complete two or more years after the normal time are therefore not captured.

import db/NUDB_AAR_FORSTE_FULLF_VS as date_completed_hs  //Year of first completed hs - no updated data for the 2020 cohort and later
import db/NUDB_SEMESTER_FFF_VS as num_semesters_hs       //Number of semesters used before first completed hs - no updated data for the 2020 cohort and later
import db/NUDB_VGS_STP_NOR1231 as stp_hs_norwegian       //Final assessment grade (standpunkt), Norwegian main written form - no updated data for the 2020 cohort and later

summarize year_completed_ps date_completed_hs num_semesters_hs

//Summary statistics for total grade points from primary school, broken down by the number of semesters used before first completing high school.
//The missing category covers pupils who had not completed high school at the last data update, including those who have dropped out.
tabulate num_semesters_hs, summarize(ps_points) missing
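
//A possible extension, not part of the original script: group the grade points into intervals and cross-tabulate them against completion status and number of semesters.
//The variable names ps_points_group and completed_hs and the cut-offs 30, 40 and 50 are illustrative assumptions; adjust them to the analysis at hand.
generate ps_points_group = 1
replace ps_points_group = 2 if ps_points >= 30
replace ps_points_group = 3 if ps_points >= 40
replace ps_points_group = 4 if ps_points >= 50
replace ps_points_group = 0 if sysmiss(ps_points) //0 = grade points missing

generate completed_hs = 1
replace completed_hs = 0 if sysmiss(date_completed_hs) //0 = no registered completion at the last data update

tabulate ps_points_group completed_hs, rowpct
tabulate ps_points_group num_semesters_hs, rowpct missing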